
Conversation

Contributor

@Qard Qard commented Jan 12, 2026

This change allows users to configure which model to use as the default for all evaluations, replacing the hardcoded gpt-4o default.

Changes:

  • Add defaultModel parameter to init() in both JS and Python
  • Add getDefaultModel() function to retrieve configured default model
  • Update LLMClassifier and RAGAS scorers to use configurable default model
  • Update documentation with examples for different use cases

This enables:

  • Using different OpenAI models (gpt-4-turbo, o1, gpt-3.5-turbo, etc.), as sketched right after this list
  • Using non-OpenAI models via Braintrust proxy (Claude, Gemini, Llama, etc.)
  • Configuring once and having all evaluators use the preferred model
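
As a minimal sketch of the first case (switching the default to another OpenAI model): this assumes `init()` can be called with only `defaultModel` and no custom client, keeping the standard OpenAI client and OPENAI_API_KEY handling. The proxy/Claude case is shown under "Example usage" below.

```javascript
import { init } from "autoevals";

// Assumption: omitting `client` keeps the default OpenAI client setup;
// only the default evaluation model changes for all evaluators.
init({ defaultModel: "gpt-4-turbo" });
```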

Example usage:

```javascript
import { init } from "autoevals";
import OpenAI from "openai";

init({
  client: new OpenAI({
    apiKey: process.env.BRAINTRUST_API_KEY,
    baseURL: "https://api.braintrust.dev/v1/proxy",
  }),
  defaultModel: "claude-3-5-sonnet-20241022",
});
```
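
A follow-up sketch of how the configured default is read back and picked up by a scorer: `getDefaultModel()` is the accessor added in this change, and the fallback behavior shown for `Factuality` (an LLMClassifier-based scorer) when no `model` option is passed is assumed from the description above, not a verified signature.

```javascript
// Run in an ES module / async context (top-level await).
import { getDefaultModel, Factuality } from "autoevals";

// Accessor added by this PR: returns the model configured via init().
console.log(getDefaultModel()); // "claude-3-5-sonnet-20241022"

// Assumption: with no `model` option, the scorer falls back to the
// configured default instead of the previously hardcoded gpt-4o.
const result = await Factuality({
  input: "What is the capital of France?",
  output: "Paris",
  expected: "Paris is the capital of France.",
});
console.log(result.score);
```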

Fixes #136

@Qard Qard requested a review from ibolmo January 12, 2026 23:54
@Qard Qard self-assigned this Jan 12, 2026
@Qard Qard requested a review from ankrgyl January 12, 2026 23:54

github-actions bot commented Jan 12, 2026

Braintrust eval report

Autoevals (model-flexibility-1768324125)

| Score | Average | Improvements | Regressions |
| --- | --- | --- | --- |
| NumericDiff | 73.4% (+1pp) | 3 🟢 | 1 🔴 |
| Time_to_first_token | 1.33tok (-0.06tok) | 85 🟢 | 33 🔴 |
| Llm_calls | 1.55 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 279.25tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 19.3tok (+0tok) | - | - |
| Completion_reasoning_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 298.54tok (+0tok) | - | - |
| Estimated_cost | 0$ (+0$) | - | - |
| Duration | 3.14s (-0.42s) | 140 🟢 | 79 🔴 |
| Llm_duration | 2.58s (-0.22s) | 106 🟢 | 13 🔴 |

@Qard Qard force-pushed the model-flexibility branch 3 times, most recently from 48320a5 to 91f19f8 on January 13, 2026 00:00
Collaborator

@ibolmo ibolmo left a comment


LGTM!

@Qard Qard force-pushed the model-flexibility branch 3 times, most recently from 7d7b9da to 7f8c1fd on January 13, 2026 00:33
@Qard Qard requested a review from ibolmo January 13, 2026 00:35
@Qard Qard force-pushed the model-flexibility branch from 7f8c1fd to d616f67 on January 13, 2026 17:08
@Qard Qard merged commit 1ff945d into main Jan 13, 2026
7 checks passed
@Qard Qard deleted the model-flexibility branch January 13, 2026 17:10

github-actions bot commented Jan 13, 2026

Braintrust eval report

Autoevals (main-1768324249)

| Score | Average | Improvements | Regressions |
| --- | --- | --- | --- |
| NumericDiff | 72.5% (-1pp) | 1 🟢 | 3 🔴 |
| Time_to_first_token | 1.34tok (+0.01tok) | 50 🟢 | 68 🔴 |
| Llm_calls | 1.55 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 279.25tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 19.3tok (+0tok) | - | - |
| Completion_reasoning_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 298.54tok (+0tok) | - | - |
| Estimated_cost | 0$ (+0$) | - | - |
| Duration | 2.86s (-0.28s) | 105 🟢 | 111 🔴 |
| Llm_duration | 2.72s (+0.14s) | 30 🟢 | 89 🔴 |


Development

Successfully merging this pull request may close these issues.

How to use an Anthropic model for evals without unsetting OPENAI_API_KEY?
